 structural break



A Generalized Adaptive Joint Learning Framework for High-Dimensional Time-Varying Models

Chen, Baolin, Ran, Mengfei

arXiv.org Machine Learning

In modern biomedical and econometric studies, longitudinal processes are often characterized by complex time-varying associations and abrupt regime shifts that are shared across correlated outcomes. Standard functional data analysis (FDA) methods, which prioritize smoothness, often fail to capture these dynamic structural features, particularly in high-dimensional settings. This article introduces Adaptive Joint Learning (AJL), a hierarchical regularization framework designed to integrate functional variable selection with structural changepoint detection in multivariate time-varying coefficient models. Unlike standard simultaneous estimation approaches, we propose a theoretically grounded two-stage screening-and-refinement procedure. This framework first synergizes adaptive group-wise penalization with sure screening principles to robustly identify active predictors, followed by a refined fused regularization step that effectively borrows strength across multiple outcomes to detect local regime shifts. We provide a rigorous theoretical analysis of the estimator in the ultra-high-dimensional regime (p >> n). Crucially, we establish the sure screening consistency of the first stage, which serves as the foundation for proving that the refined estimator achieves the oracle property, performing as well as if the true active set and changepoint locations were known a priori. A key theoretical contribution is the explicit handling of approximation bias via undersmoothing conditions to ensure valid asymptotic inference. The proposed method is validated through comprehensive simulations and an application to Sleep-EDF data, revealing novel dynamic patterns in physiological states.



ProteuS: A Generative Approach for Simulating Concept Drift in Financial Markets

Suárez-Cetrulo, Andrés L., Cervantes, Alejandro, Quintana, David

arXiv.org Artificial Intelligence

Financial markets are complex, non-stationary systems where the underlying data distributions can shift over time, a phenomenon known as regime change, or as concept drift in the machine learning literature. These shifts, often triggered by major economic events, pose a significant challenge for traditional statistical and machine learning models. A fundamental problem in developing and validating adaptive algorithms is the lack of a ground truth in real-world financial data, making it difficult to evaluate a model's ability to detect and recover from these drifts. This paper addresses this challenge by introducing a novel framework, named ProteuS, for generating semi-synthetic financial time series with pre-defined structural breaks. Our methodology involves fitting ARMA-GARCH models to real-world ETF data to capture distinct market regimes, and then simulating realistic, gradual, and abrupt transitions between them. The resulting datasets, which include a comprehensive set of technical indicators, provide a controlled environment with a known ground truth of regime changes. An analysis of the generated data confirms the complexity of the task, revealing significant overlap between the different market states. We aim to provide the research community with a tool for the rigorous evaluation of concept drift detection and adaptation mechanisms, paving the way for more robust financial forecasting models.
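To make the idea of a series with a known ground-truth break concrete, here is a minimal sketch. It is not the ProteuS implementation: the function names, the AR(1)-GARCH(1,1) simplification, and every parameter value are assumptions chosen for illustration, and nothing is fitted to real ETF data. Two regimes are simulated, and during an optional transition window each point is drawn from the second regime with linearly increasing probability.

```python
import math
import random


def simulate_ar1_garch11(n, phi, omega, alpha, beta, rng):
    """Simulate n returns with an AR(1) mean and GARCH(1,1) variance."""
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    prev_r = 0.0
    returns = []
    for _ in range(n):
        eps = math.sqrt(var) * rng.gauss(0.0, 1.0)
        r = phi * prev_r + eps
        returns.append(r)
        var = omega + alpha * eps ** 2 + beta * var  # GARCH(1,1) update
        prev_r = r
    return returns


def series_with_known_break(n_a, n_b, n_transition=0, seed=0):
    """Return (series, labels); labels[t] is 0 or 1, the regime that produced point t."""
    rng = random.Random(seed)
    total = n_a + n_transition + n_b
    # Hypothetical regime parameters (assumed, not estimated from data).
    calm = simulate_ar1_garch11(total, 0.05, 1e-6, 0.05, 0.90, rng)
    stressed = simulate_ar1_garch11(total, -0.10, 1e-5, 0.10, 0.85, rng)
    series, labels = [], []
    for t in range(total):
        if t < n_a:
            p_b = 0.0  # pure regime A
        elif t >= n_a + n_transition:
            p_b = 1.0  # pure regime B
        else:
            # Inside the transition window the probability of regime B rises linearly.
            p_b = (t - n_a + 1) / (n_transition + 1)
        from_b = rng.random() < p_b
        series.append(stressed[t] if from_b else calm[t])
        labels.append(1 if from_b else 0)
    return series, labels
```

Calling `series_with_known_break(200, 200)` gives an abrupt break at index 200, while `n_transition=50` spreads the change over 50 points. The returned labels play the role of the ground truth that a drift detector could be scored against.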


Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement

Kong, Yaxuan, Yang, Yiyuan, Hwang, Yoontae, Du, Wenjie, Zohren, Stefan, Wang, Zhangyang, Jin, Ming, Wen, Qingsong

arXiv.org Artificial Intelligence

Time series data are foundational in finance, healthcare, and energy domains. However, most existing methods and datasets remain focused on a narrow spectrum of tasks, such as forecasting or anomaly detection. To bridge this gap, we introduce Time Series Multi-Task Question Answering (Time-MQA), a unified framework that enables natural language queries across multiple time series tasks: numerical analytical tasks and open-ended question answering with reasoning. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing $\sim$200k question-answer pairs derived from diverse time series spanning environment, traffic, etc. This comprehensive resource covers various time series lengths and promotes robust model development. We further demonstrate how continually pre-training large language models (Mistral 7B, Llama-3 8B, and Qwen-2.5 7B) on the TSQA dataset enhances time series reasoning capabilities, moving beyond mere numeric tasks and enabling more advanced and intuitive interactions with temporal data. The complete TSQA dataset, models, executable code, user study questionnaires for evaluation, and results have all been open-sourced.


Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking

Chen, Xi, Qin, Chuan, Fang, Chuyu, Wang, Chao, Zhu, Chen, Zhuang, Fuzhen, Zhu, Hengshu, Xiong, Hui

arXiv.org Artificial Intelligence

In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, this dataset encompasses monthly recruitment demand for 2,324 types of skills across 521 companies. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research.


Detection and Estimation of Structural Breaks in High-Dimensional Functional Time Series

Li, Degui, Li, Runze, Shang, Han Lin

arXiv.org Machine Learning

Modelling functional time series, that is, time series of random functions defined within a finite interval, has become one of the main frontiers of development in time series models. Various functional linear and nonlinear time series models have been proposed and extensively studied in the past two decades (e.g., Bosq, 2000; Hörmann and Kokoszka, 2010; Horváth and Kokoszka, 2012; Hörmann, Horváth and Reeder, 2013; Li, Robinson and Shang, 2020). These models, together with relevant methodologies, have been applied to various fields such as biology, demography, economics, environmental science and finance. However, the model frameworks and methodologies developed in the aforementioned literature rely heavily on the stationarity assumption, which is often rejected when testing functional time series data in practice. For example, Horváth, Kokoszka and Rice (2014) find evidence of nonstationarity for intraday price curves of some stocks collected in the US market; Aue, Rice and Sönmez (2018) reject the null hypothesis of stationarity for the temperature curves collected in Australia; and Li, Robinson and Shang (2023) reveal evidence of nonstationary features for the functional time series constructed from the age- and sex-specific life-table death counts. It thus becomes imperative to test whether the collected functional time series are stationary. The primary interest of this paper is to test whether there exist structural breaks in the mean function over time and subsequently estimate the locations of breaks if they do exist. There has been increasing interest in detecting and estimating structural breaks in functional time series. Broadly speaking, there are two types of detection techniques.
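As a scalar illustration of locating a single break in the mean, the sketch below uses a plain CUSUM statistic. This is a textbook simplification and not the method of the paper, which works with high-dimensional functional data. The function name `cusum_break` is hypothetical, and a formal test would also need the statistic scaled by an estimate of the long-run variance, which is omitted here.

```python
def cusum_break(x):
    """Estimate one mean-break location with the CUSUM statistic.

    Returns (k, stat): k is the index where the second segment starts and
    stat is the largest absolute partial sum of the demeaned observations.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(x) / n
    best_k, best_stat = None, 0.0
    partial = 0.0
    for k in range(1, n):
        partial += x[k - 1] - mean  # sum of the first k demeaned values
        if abs(partial) > best_stat:
            best_k, best_stat = k, abs(partial)
    return best_k, best_stat


# Noise-free example: the mean jumps from 0 to 1 at index 30.
k_hat, stat = cusum_break([0.0] * 30 + [1.0] * 20)
```

In this noise-free example the partial sums fall steadily until the jump and rise afterwards, so the maximum sits exactly at the break. With noisy data the estimate is only approximate.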


Equivalence relations and $L^p$ distances between time series

James, Nick, Menzies, Max

arXiv.org Machine Learning

We introduce a general framework for defining equivalence and measuring distances between time series, and a first concrete method for doing so. We prove the existence of equivalence relations on the space of time series, such that the quotient spaces can be equipped with a metrizable topology. We illustrate algorithmically how to calculate such distances among a collection of time series, and perform clustering analysis based on these distances. We apply these insights to analyse the recent bushfires in NSW, Australia. There, we introduce a new method to analyse time series in a cross-contextual setting.
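For orientation, a discrete $L^p$ distance between two equal-length series, and the pairwise distance matrix that a clustering step would consume, can be sketched as follows. This is only the elementary $L^p$ norm of the difference: the equivalence relations and quotient-space construction of the paper are not reproduced, and the function names are assumptions.

```python
def lp_distance(x, y, p=2.0):
    """Discrete L^p distance between two equal-length sequences (p >= 1)."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for a valid norm")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def distance_matrix(collection, p=2.0):
    """Symmetric matrix of pairwise L^p distances with a zero diagonal."""
    n = len(collection)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = lp_distance(collection[i], collection[j], p)
    return d
```

The resulting matrix can be passed to any clustering routine that accepts precomputed distances, such as hierarchical clustering.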


Oracle Efficient Estimation of Structural Breaks in Cointegrating Regressions

Schweikert, Karsten

arXiv.org Machine Learning

In this paper, we propose an adaptive group lasso procedure to efficiently estimate structural breaks in cointegrating regressions. It is well-known that the group lasso estimator is not simultaneously estimation consistent and model selection consistent in structural break settings. Hence, we use a first step group lasso estimation of a diverging number of breakpoint candidates to produce weights for a second adaptive group lasso estimation. We prove that parameter changes are estimated consistently by group lasso if it is tuned correctly and show that the number of estimated breaks is greater than the true number but still sufficiently close to it. Then, we use these results and prove that the adaptive group lasso has oracle properties if weights are obtained from our first step estimation and the tuning parameter satisfies some further restrictions. Simulation results show that the proposed estimator delivers the expected results. An economic application to the long-run US money demand function demonstrates the practical importance of this methodology.


Detecting stationarity in time series data

#artificialintelligence

Stationarity is an important concept in time series analysis. For a concise (but thorough) introduction to the topic, and the reasons that make it important, take a look at my previous blog post on the topic. As such, the ability to determine if a time series is stationary is important. Rather than deciding between two strict options, this usually means being able to ascertain, with high probability, that a series is generated by a stationary process. In this brief post, I will cover several ways to do just that.